PMSGA: A Fast DNA Fragment Assembler

Authors

  • Juho Mäkinen
  • Jorma Tarhio
  • Sami Khuri
Abstract

DNA fragment assembly is an essential step in DNA sequencing projects. Because DNA sequencers output only short fragments (reads), the original genome must be reconstructed from these reads. This paper presents a new fragment assembly algorithm, the Pattern Matching based String Graph Assembler (PMSGA). The algorithm uses multipattern matching to detect overlaps and a minimum cost flow algorithm to detect repeats. Special care was taken to reduce the algorithm’s run time without compromising the quality of the assembly. In a comparison with well-known fragment assemblers, PMSGA was faster than the other assemblers. PMSGA produced high-quality assemblies with prokaryotic data sets, and its results for eukaryotic data are comparable with those of other assemblers.
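
The overlap-detection step mentioned in the abstract can be illustrated with a small sketch. This is not the PMSGA implementation, whose details are not given here: it is a simplified stand-in that treats the first `min_len` characters of every read as a pattern set and looks those patterns up inside the other reads through a hash index, where a real multipattern matcher would scan for all patterns simultaneously. The function name `find_overlaps` and the parameter `min_len` are hypothetical, and the sketch assumes exact, error-free overlaps on a single strand.

```python
from collections import defaultdict


def find_overlaps(reads, min_len):
    """Map each ordered read pair (i, j) to the length of the longest
    exact overlap between a suffix of reads[i] and a prefix of reads[j].

    Illustrative only: assumes error-free reads and ignores reverse
    complements. Overlaps shorter than min_len are not reported.
    """
    # The pattern set: every read indexed by its first min_len characters.
    prefix_index = defaultdict(list)
    for j, read in enumerate(reads):
        if len(read) >= min_len:
            prefix_index[read[:min_len]].append(j)

    overlaps = {}
    for i, read in enumerate(reads):
        # Start at 1 so a whole read is never used as its own suffix.
        # Scanning left to right means the first verified hit for a
        # pair is also its longest overlap.
        for start in range(1, len(read) - min_len + 1):
            seed = read[start:start + min_len]
            for j in prefix_index.get(seed, ()):
                if j == i or (i, j) in overlaps:
                    continue
                suffix = read[start:]
                # The seed matched; verify the rest of the suffix.
                if reads[j].startswith(suffix):
                    overlaps[(i, j)] = len(suffix)
    return overlaps
```

For the reads `["ACGTAC", "GTACGG", "CGGTTT"]` with `min_len=3`, the function reports an overlap of length 4 from read 0 to read 1 and one of length 3 from read 1 to read 2. These pairs would become the edges of an overlap graph; the repeat-detection step via minimum cost flow is not sketched here.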

Similar resources

MERmaid: A Parallel Genome Assembler for the Cloud

Modern genome sequencers are capable of producing millions to billions of short reads of DNA. Each new generation of genome sequencers is able to provide an order of magnitude more data than the previous, resulting in an exponential increase in required data processing throughput. The challenge today is to build a software genome assembler that is highly parallel, fast, and inexpensive to run. ...

An Eulerian path approach to DNA fragment assembly.

For the last 20 years, fragment assembly in DNA sequencing followed the "overlap-layout-consensus" paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical "overlap-layout-consensus" approach in favor of a new euler algorithm that, for the first time, re...

DNA Paired Fragment Assembly Using Graph Theory

DNA fragment assembly requirements have generated an important computational problem created by their structure and the volume of data. Therefore, it is important to develop algorithms able to produce high-quality information that use computer resources efficiently. Such an algorithm, using graph theory, is introduced in the present article. We first determine the overlaps between DNA fragments...

Using Assembler Encoding to Solve Inverted Pendulum Problem

Assembler Encoding is an Artificial Neural Network (ANN) encoding method. To date, Assembler Encoding has been tested on two problems: an optimization problem in which a solution takes the form of a matrix, and the so-called predator-prey problem, in which the task of the ANN is to control agent-predators whose common goal is to capture a fast-moving agent-prey. The next problem in which Assembler E...

Determination of Material Flows in a Multi-echelon Assembly Supply Chain

This study aims to minimize the total cost of a four-echelon supply chain including suppliers, an assembler, distributors, and retailers. The total cost consists of purchasing raw materials from the suppliers by the assembler, assembling the final product, materials transportation from the suppliers to the assembler, product transportation from the assembler to the distributors, product transpo...

Publication date: 2010